Search CORE

11 research outputs found

Non-hierarchical Structures: How to Model and Index Overlaps?

Author: Bratsberg Svein Erik
Hasibi Faegheh
Publication venue
Publication date: 08/10/2016
Field of study

Overlap is a common phenomenon seen when structural components of a digital object are neither disjoint nor nested inside each other. Overlapping components resist reduction to a structural hierarchy, and tree-based indexing and query processing techniques cannot be used for them. Our solution to this data modeling problem is TGSA (Tree-like Graph for Structural Annotations), a novel extension of the XML data model for non-hierarchical structures. We introduce an algorithm for constructing TGSA from annotated documents; the algorithm can efficiently process non-hierarchical structures and is associated with formal proofs, ensuring that transformation of the document to the data model is valid. To enable high performance query analysis in large data repositories, we further introduce an extension of XML pre-post indexing for non-hierarchical structures, which can process both reachability and overlapping relationships.Comment: The paper has been accepted at the Balisage 2014 conferenc

arXiv.org e-Print Archive

CiteSeerX

Personal Entity, Concept, and Named Entity Linking in Conversations

Author: Hasibi Faegheh
Joko Hideaki
Publication venue
Publication date: 27/09/2023
Field of study

Building conversational agents that can have natural and knowledge-grounded interactions with humans requires understanding user utterances. Entity Linking (EL) is an effective and widely used method for understanding natural language text and connecting it to external knowledge. It is, however, shown that existing EL methods developed for annotating documents are suboptimal for conversations, where personal entities (e.g., "my cars") and concepts are essential for understanding user utterances. In this paper, we introduce a collection and a tool for entity linking in conversations. We collect EL annotations for 1327 conversational utterances, consisting of links to named entities, concepts, and personal entities. The dataset is used for training our toolkit for conversational entity linking, CREL. Unlike existing EL methods, CREL is developed to identify both named entities and concepts. It also utilizes coreference resolution techniques to identify personal entities and references to the explicit entity mentions in the conversations. We compare CREL with state-of-the-art techniques and show that it outperforms all existing baselines

arXiv.org e-Print Archive

Data Augmentation for Conversational AI

Author: Hasibi Faegheh
Kanoulas Evangelos
Soudani Heydar
Publication venue
Publication date: 09/09/2023
Field of study

Advancements in conversational systems have revolutionized information access, surpassing the limitations of single queries. However, developing dialogue systems requires a large amount of training data, which is a challenge in low-resource domains and languages. Traditional data collection methods like crowd-sourcing are labor-intensive and time-consuming, making them ineffective in this context. Data augmentation (DA) is an affective approach to alleviate the data scarcity problem in conversational systems. This tutorial provides a comprehensive and up-to-date overview of DA approaches in the context of conversational systems. It highlights recent advances in conversation augmentation, open domain and task-oriented conversation generation, and different paradigms of evaluating these models. We also discuss current challenges and future directions in order to help researchers and practitioners to further advance the field in this area

arXiv.org e-Print Archive

Conversational Entity Linking: Problem Definition and Datasets

Author: Balog Krisztian
de Vries Arjen
Hasibi Faegheh
Joko Hideaki
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2021
Field of study

Machine understanding of user utterances in conversational systems is of utmost importance for enabling engaging and meaningful conversations with users. Entity Linking (EL) is one of the means of text understanding, with proven efficacy for various downstream tasks in information retrieval. In this paper, we study entity linking for conversational systems. To develop a better understanding of what EL in a conversational setting entails, we analyze a large number of dialogues from existing conversational datasets and annotate references to concepts, named entities, and personal entities using crowdsourcing. Based on the annotated dialogues, we identify the main characteristics of conversational entity linking. Further, we report on the performance of traditional EL systems on our Conversational Entity Linking dataset, ConEL, and present an extension to these methods to better fit the conversational setting. The resources released with this paper include annotated datasets, detailed descriptions of crowdsourcing setups, as well as the annotations produced by various EL systems. These new resources allow for an investigation of how the role of entities in conversations is different from that in documents or isolated short text utterances like queries and tweets, and complement existing conversational datasets.publishedVersio

Radboud Repository

NORA - Norwegian Open Research Archives

UiS Brage

Find the Funding: Entity Linking with Incomplete Funding Knowledge Bases

Author: Aydin Gizem
Hasibi Faegheh
Tabatabaei Seyed Amin
Tsatsaronis Giorgios
Publication venue
Publication date: 20/09/2022
Field of study

Automatic extraction of funding information from academic articles adds significant value to industry and research communities, such as tracking research outcomes by funding organizations, profiling researchers and universities based on the received funding, and supporting open access policies. Two major challenges of identifying and linking funding entities are: (i) sparse graph structure of the Knowledge Base (KB), which makes the commonly used graph-based entity linking approaches suboptimal for the funding domain, (ii) missing entities in KB, which (unlike recent zero-shot approaches) requires marking entity mentions without KB entries as NIL. We propose an entity linking model that can perform NIL prediction and overcome data scarcity issues in a time and data-efficient manner. Our model builds on a transformer-based mention detection and bi-encoder model to perform entity linking. We show that our model outperforms strong existing baselines.Comment: Accepted at COLING 202

arXiv.org e-Print Archive

MMEAD: MS MARCO Entity Annotations and Disambiguations

Author: de Vries Arjen P.
Hasibi Faegheh
Kamphuis Chris
Lin Aileen
Lin Jimmy
Yang Siwen
Publication venue
Publication date: 14/09/2023
Field of study

MMEAD, or MS MARCO Entity Annotations and Disambiguations, is a resource for entity links for the MS MARCO datasets. We specify a format to store and share links for both document and passage collections of MS MARCO. Following this specification, we release entity links to Wikipedia for documents and passages in both MS MARCO collections (v1 and v2). Entity links have been produced by the REL and BLINK systems. MMEAD is an easy-to-install Python package, allowing users to load the link data and entity embeddings effortlessly. Using MMEAD takes only a few lines of code. Finally, we show how MMEAD can be used for IR research that uses entity information. We show how to improve recall@1000 and MRR@10 on more complex queries on the MS MARCO v1 passage dataset by using this resource. We also demonstrate how entity expansions can be used for interactive search applications

arXiv.org e-Print Archive

Semantic Search with Knowledge Bases

Author: Hasibi Faegheh
Publication venue: 'Norwegian University of Science and Technology (NTNU) Library'
Publication date: 01/01/2018
Field of study

Over the past decade, modern search engines have made significant progress towards better understanding searchers' intents and providing them with more focused answers, a paradigm that is called \semantic search." Semantic search is a broad area that encompasses a variety of tasks and has a core enabling data component, called the knowledge base. In this thesis, we utilize knowledge bases to address three tasks involved in semantic search: (i) query understanding, (ii) entity retrieval, and (iii) entity summarization. Query understanding is the first step in virtually every semantic search system. We study the problem of identifying entity mentions in queries and linking them to the corresponding entries in a knowledge base. We formulate this as the task of entity linking in queries, propose refinements to evaluation measures, and publish a test collection for training and evaluation purposes. We further establish a baseline method for this task through a reproducibility study, and introduce different methods with the aim to strike a balance between efficiency and effectiveness. Next, we turn to using the obtained annotations for answering the queries. Here, our focus is on the entity retrieval task: answering search queries by returning a ranked list of entities. We introduce a general feature-based model based on Markov Random Fields, and show improvements over existing baseline methods. We find that the largest gains are achieved for complex natural language queries. Having generated an answer to the query (from the entity retrieval step), we move on to presentation aspects of the results. We introduce and address the novel problem of dynamic entity summarization for entity cards, by breaking it into two subtasks, fact ranking and summary generation. We perform an extensive evaluation of our method using crowdsourcing, and show that our supervised fact ranking method brings substantial improvements over the most comparable baselines. In this thesis, we take the reproducibility of our research very seriously. Therefore, all resources developed within the course of this work are made publicly available. We further make two major software and resource contributions: (i) the Nordlys toolkit, which implements a range of methods for semantic search, and (ii) the extended DBpedia-Entity test collection

NORA - Norwegian Open Research Archives

Gridvoronoi: An Efficient Spatial Index for Nearest Neighbor Query Processing

Author: Almpanidis George
Fan Gaojuan
Hasibi Faegheh
Zhang Chongsheng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

Contains fulltext : 208505.pdf (publisher's version ) (Open Access

Radboud Repository